Energy Efficient Cache Organizations for Superscalar Processors
Abstract
Organizational techniques for reducing energy dissipation in on-chip processor caches as well as off-chip caches have been observed to provide substantial energy savings in a technology-independent manner. We propose and evaluate block buffering with multiple block buffers, subbanking, and bit line isolation to reduce power dissipation within on-chip caches for superscalar CPUs. We use a detailed register-level superscalar simulator to collect the transition counts that occur within various cache components during the execution of SPEC 95 benchmarks. These transition counts are fed into an energy dissipation model for a 0.8 micron cache, so that power dissipation within each cache component can be estimated accurately. We show that 4 block buffers, combined with subbanking and bit line isolation, can reduce the energy dissipation of conventional caches substantially, often by as much as 60–70%.
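The idea behind block buffering can be illustrated with a small sketch: recently used cache blocks are held in a few buffers, and an access that matches a buffered block does not need to activate the main cache array. The Python below is only an illustration under stated assumptions, not the simulator or energy model described in the abstract. The function name `simulate_block_buffers`, the 32-byte block size, the LRU replacement among buffers, and the toy address trace are all hypothetical choices made for the example.

```python
from collections import OrderedDict


def simulate_block_buffers(addresses, block_size=32, num_buffers=4):
    """Count accesses served by block buffers vs. the main cache array.

    Assumptions (not taken from the paper): byte addresses, LRU
    replacement among the buffers, and every buffer miss costs exactly
    one access to the cache array.
    """
    buffers = OrderedDict()  # block number -> None, ordered oldest to newest
    buffer_hits = 0
    array_accesses = 0
    for addr in addresses:
        block = addr // block_size
        if block in buffers:
            # Served from a block buffer: the cache array stays idle.
            buffer_hits += 1
            buffers.move_to_end(block)
        else:
            # Buffer miss: read the cache array and keep the block.
            array_accesses += 1
            buffers[block] = None
            if len(buffers) > num_buffers:
                buffers.popitem(last=False)  # evict least recently used
    return buffer_hits, array_accesses


# Hypothetical toy trace of byte addresses.
trace = [0, 4, 8, 64, 68, 0, 12, 128, 64]
print(simulate_block_buffers(trace, num_buffers=1))  # (4, 5)
print(simulate_block_buffers(trace, num_buffers=4))  # (6, 3)
```

On this toy trace, going from one buffer to four reduces the number of array accesses from 5 to 3. How such counts translate into energy depends on the per-component energy model, which this sketch does not attempt to reproduce.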
Similar resources
Tradeoffs in Processor/Memory Interfaces for Superscalar Processors
The current scheme of dealing with data cache misses is not well-suited for superscalar processors. In this scheme, the processor is blocked by holding its clock low until the missing cache block can be fetched from memory and inserted into the cache. From the processor's viewpoint, the miss did not occur. From the user's viewpoint, the execution time was lengthened in direct proportion to the ...
Cache designs for energy efficiency
Caches usually consume a significant amount of energy in modern microprocessors (e.g. superpipelined or superscalar processors). In this paper, we examine contemporary cache design techniques and provide an analytical model for estimating cache energy consumption. We also present several novel techniques for designing an energy-efficient cache, which include block buffering, cache subbanking, ...
Exploring the performance of split data cache schemes on superscalar processors and symmetric multiprocessors
Current technology continues providing smaller and faster transistors, so processor architects can offer more complex and functional ILP processors, because manufacturers can fit more transistors on the same chip area. As a consequence, the fraction of chip area reachable in a single clock cycle is dropping, and at the same time the number of transistors on the chip is increasing. However, prob...
Efficient Implementation of Nearest Neighbor Classification
An efficient approach to Nearest Neighbor classification is presented, which improves performance by exploiting the ability of superscalar processors to issue multiple instructions per cycle and by using the memory hierarchy adequately. This is accomplished by the use of floating-point arithmetic which outperforms integer arithmetic, and block (tiled) algorithms which exploit the data locality ...
Superscalar GEMM-based Level 3 BLAS - The On-going Evolution of a Portable and High-Performance Library
Recently, a first version of our GEMM-based level 3 BLAS for superscalar type processors was announced. A new feature is the inclusion of DGEMM itself. This DGEMM routine contains inline what we call a level 3 kernel routine, which is based on register blocking. Additionally, it features level 1 cache blocking and data copying of sub-matrix operands for the level 3 kernel. Our other BLAS's which ...